LLMs as Counterfactual Explanation Modules: Can ChatGPT Explain Black-box Text Classifiers?
Large language models (LLMs) are increasingly being used for tasks beyond
text generation, including complex tasks such as data labeling and information
extraction. With the recent surge in research efforts to comprehend the
full extent of LLM capabilities, in this work, we investigate the role of LLMs
as counterfactual explanation modules, to explain decisions of black-box text
classifiers. Inspired by causal thinking, we propose a pipeline for using LLMs
to generate post-hoc, model-agnostic counterfactual explanations in a
principled way via (i) leveraging the textual understanding capabilities of the
LLM to identify and extract latent features, and (ii) leveraging the
perturbation and generation capabilities of the same LLM to generate a
counterfactual explanation by perturbing input features derived from the
extracted latent features. We evaluate three variants of our framework, with
varying degrees of specificity, on a suite of state-of-the-art LLMs, including
ChatGPT and LLaMA 2, and assess the effectiveness and quality of the generated
counterfactual explanations over a variety of text classification benchmarks.
Our results show varied performance of these models in different settings, with
a full two-step, feature-extraction-based variant outperforming the others in
most cases. Our pipeline can be used in automated explanation systems,
potentially reducing human effort.
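As a rough illustration of the two-step idea described above (not the authors' implementation), here is a minimal Python sketch. The `llm` argument is a hypothetical text-in/text-out callable wrapping any chat-completion API, and `black_box_predict` stands in for the classifier being explained; both names and the prompts are illustrative assumptions.

```python
from typing import Callable, List, Optional


def extract_latent_features(llm: Callable[[str], str], text: str) -> List[str]:
    """Step (i): ask the LLM to name the latent features likely driving the label."""
    prompt = (
        "List the key latent features (topics, sentiment, entities, style) that a "
        f"classifier would likely rely on for this text, one per line:\n{text}"
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def generate_counterfactual(
    llm: Callable[[str], str],
    black_box_predict: Callable[[str], int],
    text: str,
) -> Optional[str]:
    """Step (ii): perturb the input along each extracted feature until the label flips."""
    original_label = black_box_predict(text)
    for feature in extract_latent_features(llm, text):
        prompt = (
            f"Minimally rewrite the text so that the feature '{feature}' is altered "
            f"or reversed, keeping everything else unchanged:\n{text}"
        )
        candidate = llm(prompt)
        if black_box_predict(candidate) != original_label:
            return candidate  # prediction flipped: a valid counterfactual
    return None  # no label flip found under these perturbations
```

A counterfactual found this way is post-hoc and model-agnostic: only the classifier's predictions are queried, never its internals.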
PEACE: Cross-Platform Hate Speech Detection - A Causality-guided Framework
Hate speech detection refers to the task of detecting hateful content that
aims at denigrating an individual or a group based on their religion, gender,
sexual orientation, or other characteristics. Due to the different policies of
the platforms, different groups of people express hate in different ways.
Furthermore, due to the lack of labeled data on some platforms, it becomes
challenging to build hate speech detection models. To this end, we revisit
whether we can learn a generalizable hate speech detection model in the
cross-platform setting, where we train the model on data from one (source) platform and
generalize the model across multiple (target) platforms. Existing
generalization models rely on linguistic cues or auxiliary information, making
them biased towards certain tags or certain kinds of words (e.g., abusive
words) on the source platform and thus not applicable to the target platforms.
Inspired by social and psychological theories, we endeavor to explore if there
exist inherent causal cues that can be leveraged to learn generalizable
representations for detecting hate speech across these distribution shifts. To
this end, we propose a causality-guided framework, PEACE, that identifies and
leverages two intrinsic causal cues omnipresent in hateful content: the overall
sentiment and the aggression in the text. We conduct extensive experiments
across multiple platforms (representing the distribution shift), evaluating
whether these causal cues can help cross-platform generalization.
Comment: ECML PKDD 202
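To make the cue-fusion idea concrete, the following is a minimal PyTorch sketch of one plausible way to combine the two causal cues (overall sentiment and aggression) with a text representation. The architecture, dimensions, and the assumption that the cues arrive as scalar scores from separate extractors are mine, not the paper's.

```python
import torch
import torch.nn as nn


class CausalCueClassifier(nn.Module):
    """Fuses a text embedding with sentiment and aggression cue scores."""

    def __init__(self, text_dim: int = 768, cue_dim: int = 1, hidden: int = 128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + 2 * cue_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # hateful vs. non-hateful
        )

    def forward(self, text_emb, sentiment_score, aggression_score):
        # text_emb: (batch, text_dim) from any pretrained encoder
        # sentiment_score, aggression_score: (batch, 1) cue estimates
        x = torch.cat([text_emb, sentiment_score, aggression_score], dim=-1)
        return self.fuse(x)


# Usage with dummy tensors:
model = CausalCueClassifier()
logits = model(torch.randn(4, 768), torch.rand(4, 1), torch.rand(4, 1))
```

The intuition is that sentiment and aggression are properties of hateful content itself rather than of any one platform's vocabulary, so conditioning on them should transfer better across platforms than platform-specific lexical cues.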
Domain Generalization -- A Causal Perspective
Machine learning models rely on various assumptions to attain high accuracy.
One of the preliminary assumptions of these models is the independent and
identical distribution, which suggests that the train and test data are sampled
from the same distribution. However, this assumption seldom holds in the real
world due to distribution shifts. As a result, models that rely on this
assumption exhibit poor generalization capabilities. Over recent years,
dedicated efforts have been made to improve the generalization capabilities of
these models, collectively known as domain generalization methods.
The primary idea behind these methods is to identify stable features or
mechanisms that remain invariant across the different distributions. Many
generalization approaches employ causal theories to describe invariance since
causality and invariance are inextricably intertwined. However, current surveys
treat causality-aware domain generalization methods only at a very high level.
Furthermore, we argue that these methods can be categorized by how causality is
leveraged and in which part of the model pipeline it is used. To this end, we
categorize the causal domain
generalization methods into three categories, namely, (i) Invariance via Causal
Data Augmentation methods which are applied during the data pre-processing
stage, (ii) Invariance via Causal representation learning methods that are
utilized during the representation learning stage, and (iii) Invariance via
Transferring Causal mechanisms methods that are applied during the
classification stage of the pipeline. Furthermore, this survey includes
in-depth insights into benchmark datasets and code repositories for domain
generalization methods. We conclude the survey with insights and discussions on
future directions.
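As a toy illustration of category (i), invariance via causal data augmentation, here is a hedged Python sketch of a pre-processing step that perturbs presumed spurious (non-causal) attributes while leaving the label-determining content untouched. The token list and swap rule are purely illustrative assumptions, not any specific surveyed method.

```python
import random

# Hypothetical spurious attributes, e.g., platform markers that correlate with
# the label in the source domain but do not cause it.
SPURIOUS_TOKENS = {"reddit", "twitter", "gab"}


def causal_augment(text: str, label: int, n_views: int = 2):
    """Return label-preserving views where only spurious tokens are replaced."""
    views = []
    for _ in range(n_views):
        tokens = [
            random.choice(tuple(SPURIOUS_TOKENS)) if t.lower() in SPURIOUS_TOKENS else t
            for t in text.split()
        ]
        views.append((" ".join(tokens), label))  # label unchanged: causal content intact
    return views
```

Training on such augmented views discourages the model from latching onto the spurious attributes, which is the shared goal across all three categories; categories (ii) and (iii) pursue it at the representation and classification stages instead.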